DOCUMENT RESUME

ED 128 462                                              TM 005 655

AUTHOR       Kohr, Richard L.
TITLE        An Evaluation of a Multiple Matrix Sampling Procedure
             for a State Assessment Program.
PUB DATE     [Apr 76]
NOTE         21p.; Paper presented at the Annual Meeting of the
             National Council on Measurement in Education (San
             Francisco, California, April 19-23, 1976)

EDRS PRICE   MF-$0.83 HC-$1.67 Plus Postage.
DESCRIPTORS  *Educational Assessment; Elementary Secondary
             Education; *Item Sampling; Measurement Techniques;
             Simulation; *State Programs; *Testing Programs; Test
             Reliability
IDENTIFIERS  Multiple Matrix Sampling; *Pennsylvania Educational
             Quality Assessment

ABSTRACT
     Pennsylvania's Educational Quality Assessment Program
provides each participating school with a building level report in
which state percentiles are a prominent part. Multiple matrix
sampling was being considered as a technique to reduce testing time.
However, there was great concern that the error associated with
estimating the school mean might lead to markedly different
percentiles than obtained by census testing. Generally favorable
results are reported from a post mortem simulation of multiple matrix
sampling for a 2 to 6 subtest/subgroup sampling plan involving data
obtained from over 30,000 students in 500 elementary schools.
(Author/BH)



* Documents acquired by ERIC include many informal unpublished         *
* materials not available from other sources. ERIC makes every effort  *
* to obtain the best copy available. Nevertheless, items of marginal   *
* reproducibility are often encountered and this affects the quality   *
* of the microfiche and hardcopy reproductions ERIC makes available    *
* via the ERIC Document Reproduction Service (EDRS). EDRS is not       *
* responsible for the quality of the original document. Reproductions  *
* supplied by EDRS are the best that can be made from the original.    *






An Evaluation of a Multiple Matrix Sampling 
Procedure for a State Assessment Program



Richard L. Kohr 
Pennsylvania Department of Education 



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING
IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT
OFFICIAL NATIONAL INSTITUTE OF
EDUCATION POSITION OR POLICY.



Presented at the Annual Meeting of the 
National Council for Measurement in Education

San Francisco, California 

April 1976 






An Evaluation of a Multiple Matrix Sampling 
Procedure for a State Assessment Program 

Richard L. Kohr 
Pennsylvania Department of Education

Pennsylvania's Educational Quality Assessment Program provides each
participating school with a building level report in which state percentiles
are a prominent part. Multiple matrix sampling was being considered as a
technique to reduce testing time. However, there was great concern that the
error associated with estimating the school mean might lead to markedly
different percentiles than obtained by census testing. Reported are generally
favorable results from a post mortem simulation of multiple matrix
sampling for a 2 to 6 subtest/subgroup sampling plan involving data obtained
from over 30,000 students in 500 elementary schools.



An Evaluation of a Multiple Matrix Sampling 
Procedure for a State Assessment Program

Much of the recent literature on multiple matrix sampling has dealt with 
theoretical aspects of parameter estimation. Various studies suggest ways 
to optimize estimation under specified restrictions, but only a few investi- 
gations have dealt with practical considerations. When one considers the 
application of multiple matrix sampling to a situation in which, not one, 
but a battery of instruments are to be given to students, the problem becomes 
increasingly complex, especially when the instruments vary in size and where 
some are cognitive and others affective. This circumstance exists in the 
Pennsylvania Department of Education's (PDE) assessment program. Since 1969 
schools have been assessed on each of 10 state adopted goals (PDE, 1973). 
When a school underwent assessment, all of the students took each of the 11 
or 12 instruments in the battery, a process which required about 4 hours of 
testing time. During a recent review of the Pennsylvania Educational Quality 
Assessment (EQA) program, advisory committees recommended an enlargement of 
content coverage in a number of areas. The suggested changes would require 
several new instruments as well as an increase in the number of items in 
various other instruments. The inevitable result of instrument expansion is 
an increase of student testing time to a degree that, in this instance, was 
judged to be beyond tolerable limits. 

Thus, multiple matrix sampling was brought under consideration as a
potential time-saving technique. Partitioning instruments into several
subtests of non-overlapping items could result in a substantial reduction of
testing time while simultaneously permitting an extension of content coverage.

Of immediate concern for planning was the question of the number of 
subtests to be employed. Given an estimate of the probable number of items 



in the final battery and the desire to reduce testing time to approximately
two hours, it was determined that about four subtests would be required. A
number of people voiced concern over the amount of error that would be
introduced by the procedure. For a given number of subtests, the amount of
error in estimating a school mean might be well within tolerable limits for
one test but beyond the acceptable range for another. An additional concern
revolves around the question of how much error is tolerable. A determination
of tolerable error will also influence a decision regarding the number
of subtests to have. How can the question be translated into terms meaningful
to administrators of assessment programs?

In the EQA program, mean test scores are produced for each building.
Included in a school report (PDE, 1974) is the state percentile rank attained
by that school on each test. Hence, a major concern was whether a school
mean, as estimated by multiple matrix sampling, would place the school at
approximately the same percentile as the mean score attained by census
testing. Thus, one approach to evaluating the effect of matrix sampling is
in terms of the difference in percentile rank achieved by these two methods.
For example, suppose an uncomfortably high percentage of schools deviated
from their "true" placement by more than, say, 10 percentile points when
multiple matrix sampling was applied. EQA staff was concerned that such a
circumstance would greatly hinder the believability of the assessment report
by school people who are accustomed to receiving information based on census
testing. This concern is especially acute in low scoring schools where there
is a greater tendency for a worried administration to attempt to discredit
the report by claiming the results contain too much error to be trustworthy.
To get a picture of the amount of error the EQA program would have to live
with under various sampling plans, a simulation of multiple matrix sampling
was conducted in the Fall of 1975.




Method 

Instruments 

Instruments in the EQA package range from 28 items to 63 items, depending
on grade level. Consideration was given to multiple matrix sampling
plans with two to six subtests equally balanced with respect to items from
each instrument. The instruments selected for the simulation included both
cognitive and non-cognitive measures and are more fully described in the
EQA technical manual (PDE, 1975). A 40 item self esteem scale having a four
choice Likert format represented the non-cognitive area. This scale, similar
to the Coopersmith (1967) Self Esteem Inventory, is internally consistent
(Coefficient Alpha of .88, N = 3400) with item means ranging from 1.29 to
2.30 and an average item mean of 1.72 where item values range from 0 to 3.
Cognitive measures included a 30 item verbal analogies test and a 60 item
composite achievement measure consisting of verbal analogies and mathematics
reasoning. Internal consistency reliability estimates were .83 for verbal
analogies and .79 for mathematics reasoning. Difficulty level ranged from
.19 to .95 with an average of .60 for verbal analogies and .16 to .95 with an
average of .63 for composite achievement. These instruments were selected
for the simulation exercise since they were each capable of subdivision into
at least three matrix sampling plans. For example, a 40 item scale may be
divided into two 20 item subtests, four 10 item subtests and five 8 item
subtests. Using instruments with 30, 40 and 60 items for the simulation
exercise should provide a reasonable picture of what to expect in practice
since they represent the range of instrument sizes found in the battery.
Since the most severe estimation problems occur for elementary schools,
which generally have much smaller enrollments than secondary schools, the 5th
grade data base was chosen.
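
The plans available for each instrument follow from simple divisibility: a plan with T subtests is "equally balanced" only if T divides the number of items. The short Python sketch below illustrates this; the function name is hypothetical and the code is not taken from the original FORTRAN program.

```python
def feasible_subtest_counts(n_items, candidates=range(2, 7)):
    """Return the subtest counts T (2 to 6 by default) that split
    n_items into equal-sized, non-overlapping subtests."""
    return [t for t in candidates if n_items % t == 0]


# Instrument sizes used in the simulation (from the text above).
for n_items in (30, 40, 60):
    print(n_items, "items:", feasible_subtest_counts(n_items))
```

Under this rule the 40 item scale admits plans with 2, 4 and 5 subtests, which agrees with the example given in the text.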




Procedure 

The procedure followed that of the typical post mortem simulation in
which data, originally collected by having students answer all the items,
are later acted upon as if the students had taken different subsets of items.
In multiple matrix sampling, a universe of K items has been partitioned into
T subtests of k items. The population of N students is randomly divided
into T subgroups of n students. Each subgroup of students takes a different
subtest. An estimate of the parameter of interest is computed for each
subgroup and a linear combination is obtained for the population estimate.
In the current situation we are interested in estimating the mean test score
for a school for hypothetical 2-subtest/2-subgroup to 6-subtest/6-subgroup
cases. Since data have already been collected via census testing, each student
has responded to all items. Random sampling of items into subtests is
frequently used in Monte Carlo studies; however, the literature contains some
suggestions regarding the advisability of assigning items according to a
stratification based on item characteristics such as difficulty level. The
present study attempted to create an optimal item assignment and an adverse
assignment condition. Using item analysis information from a data base of
about 3400 cases, the items were first rank ordered with respect to item
mean. A matched condition was created by alternately assigning items from
the ordered list to a subscale in the two subscale case. In the five subscale
case, items ranked 1, 6, 11, ... were assigned to the first subscale, items
ranked 2, 7, 12, ... to the second subscale, etc. This procedure was
followed in order to make the subscales as comparable as possible. A
dissimilar or ranked condition was produced by assigning items to subtests so
as to maximize the difference between subtests in terms of average item mean.
For example, under a two subtest situation the "lowest" half of the items
were assigned to one subtest and the "largest" half to the other subtest.
Likewise, in the five subscale case, the "lowest" fifth of the items were
assigned to the first subscale, the next fifth to the second subscale, and
so on.
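
The two item assignment conditions can be sketched in a few lines of Python. This is only an illustration of the rules described above, not the original program; the function names and the ten item means are hypothetical.

```python
def matched_assignment(item_means, n_subtests):
    """Deal rank-ordered items out in rotation (ranks 1, T+1, 2T+1, ...
    go to the first subtest) so that subtests are as alike as possible."""
    order = sorted(range(len(item_means)), key=lambda j: item_means[j])
    return [order[t::n_subtests] for t in range(n_subtests)]


def ranked_assignment(item_means, n_subtests):
    """Give the lowest block of items to the first subtest, the next
    block to the second, and so on (the adverse condition)."""
    order = sorted(range(len(item_means)), key=lambda j: item_means[j])
    k = len(order) // n_subtests          # items per subtest
    return [order[t * k:(t + 1) * k] for t in range(n_subtests)]


def average_item_means(item_means, subtests):
    """Average item mean of each subtest."""
    return [sum(item_means[j] for j in s) / len(s) for s in subtests]


# Hypothetical item means for a 10 item instrument (illustration only).
means = [0.20, 0.35, 0.40, 0.55, 0.60, 0.65, 0.70, 0.80, 0.90, 0.95]
print(average_item_means(means, matched_assignment(means, 2)))
print(average_item_means(means, ranked_assignment(means, 2)))
```

With these made-up values the matched subtests differ only slightly in average item mean, while the ranked subtests differ by the largest amount possible for two equal halves.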



Identified in Table 1 are the matrix sampling conditions investigated
in the present study. The table also shows the average item mean for those
items comprising each subtest under both the matched and ranked conditions.



Place Table 1 About Here 



The assignment of students was accomplished by systematic sampling
procedures. First of all, the order in which student data records appear on
the 1975 grade 5 assessment data tape is essentially random. All records
for the students within a particular school were located together. Student
data for 500 elementary schools (approximately 31,000 students) were
contained on the tape. In the two-subtest/two-subgroup condition, students
were assigned alternately to the first, then the second subgroup. In the
five-subtest/five-subgroup case, students were similarly assigned. This
procedure approximates the method that EQA would use in practice. That is,
in testing a large group of students "subtest packages" would be prepared
and interspersed. For example, if there were two subtests, every other
student would receive the same subtest package.
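
A minimal sketch of this systematic assignment, assuming students are simply taken in tape order within a school and dealt to subgroups in rotation (the function name is hypothetical):

```python
def assign_subgroups(n_students, n_subtests):
    """Return the subgroup index (0-based) for each student in tape
    order: the 1st student gets subtest 0, the 2nd subtest 1, and so
    on, cycling back to subtest 0 after the last subtest."""
    return [i % n_subtests for i in range(n_students)]


# Seven students and two subtests: every other student gets the same package.
print(assign_subgroups(7, 2))
```

Because assignment cycles through the subtests, subgroup sizes within a school never differ by more than one student.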

A FORTRAN IV computer program was written by the author to produce, for 
each school building, an estimated mean developed from a composite of the 
separate subtest means. 



An estimate of a school mean, $\hat{X}$, via multiple matrix sampling is given by:

$$\hat{X} = \sum_{t=1}^{T} \bar{X}_t$$

The symbol $\bar{X}_t$ is the mean score for a given subtest, which is found by:

$$\bar{X}_t = \sum_{i=1}^{n} X_{it} / n$$

In the above formula, n refers to the number of students taking the subtest
and $X_{it}$ is the summated score for the ith examinee on the tth subtest,

$$X_{it} = \sum_{j=1}^{k} x_{ijt}$$

where  k = number of items in the subscale
       $x_{ijt}$ = an item score
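
The computation for one school can be sketched as follows. The sketch assumes the school estimate is the sum of the separate subtest means, which is the natural composite when the subtests partition the instrument; it also assumes students are dealt to subgroups in rotation. All names and the small response matrix are hypothetical, and this is not the author's FORTRAN IV program.

```python
def estimate_school_mean(responses, subtests):
    """Estimate the school mean total score from census data treated as
    if each student had taken only one subtest.

    responses: one list of item scores per student (census data)
    subtests:  list of item-index lists that partition the instrument
    Assumes at least one student falls in every subgroup.
    """
    n_subtests = len(subtests)
    estimate = 0.0
    for t, items in enumerate(subtests):
        # Summated subtest scores for the students in subgroup t only.
        scores = [sum(row[j] for j in items)
                  for i, row in enumerate(responses) if i % n_subtests == t]
        estimate += sum(scores) / len(scores)   # subtest mean
    return estimate


def actual_school_mean(responses):
    """Conventional mean of total scores over all students."""
    return sum(sum(row) for row in responses) / len(responses)


# Four students, four dichotomous items (made-up data).
responses = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [1, 0, 0, 0]]
subtests = [[0, 2], [1, 3]]
print(estimate_school_mean(responses, subtests), actual_school_mean(responses))
```

In this toy school the estimate (2.0) and the census mean (2.25) differ; with so few students the gap is large, which mirrors the enrollment effect examined in the paper.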

An actual school mean was also computed in the conventional manner.
These results were printed, then the next school's data were read and the
process repeated. After results were printed for all 500 elementary schools,
the program computed the standard error of estimate by averaging the sum of
the squared deviations of the estimated school mean from the actual school
mean. Also computed was the correlation between the actual and estimated
school means. Percentile ranks, derived from the grade 5 statewide norm
sample, were assigned to each school's estimated and actual mean scale
score. The frequency and percent of schools having a given size of
discrepancy between the two percentile ranks were computed and printed.
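
The three summary measures can be sketched in Python as below. Several details are assumptions rather than facts from the paper: the standard error is taken here as the square root of the average squared deviation, the percentile rank is defined as the percent of norm-sample means at or below a value, and all function names are hypothetical.

```python
import math
from bisect import bisect_right

BINS = ("0-5", "6-10", "11-15", "16-20", "21+")


def rms_error(actual, estimated):
    """Square root of the average squared deviation (assumed definition)."""
    squares = [(e - a) ** 2 for a, e in zip(actual, estimated)]
    return math.sqrt(sum(squares) / len(squares))


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percentile_rank(value, sorted_norm_means):
    """Percent of norm-sample school means at or below value, rounded."""
    below = bisect_right(sorted_norm_means, value)
    return round(100 * below / len(sorted_norm_means))


def discrepancy_counts(actual, estimated, sorted_norm_means):
    """Count schools by absolute percentile-rank discrepancy category."""
    counts = dict.fromkeys(BINS, 0)
    for a, e in zip(actual, estimated):
        diff = abs(percentile_rank(a, sorted_norm_means)
                   - percentile_rank(e, sorted_norm_means))
        if diff <= 5:
            counts["0-5"] += 1
        elif diff <= 10:
            counts["6-10"] += 1
        elif diff <= 15:
            counts["11-15"] += 1
        elif diff <= 20:
            counts["16-20"] += 1
        else:
            counts["21+"] += 1
    return counts
```

Dividing each count by the number of schools gives proportions in the same five categories used in Tables 3 through 7.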






Results and Discussion 

Comparisons among the conditions investigated were made on the basis of
the correlation between actual and estimated school means, standard error
and frequency of deviant percentiles.

Displayed in Table 2 are the correlations between actual and estimated
school means for each case investigated. Note the gradual decrease in the
magnitude of r as the number of subtests increases. Such a result should be
anticipated since the error in estimating a school mean will also increase
as a fixed number of items is partitioned into more and more subtests.
Without exception the r observed for the matched condition is higher in magnitude
than r obtained in the ranked condition. The effect is highly consistent
although the difference between r's for the matched and ranked condition is
not statistically significant in any of the cases.



Place Table 2 about here 



Summarized in Tables 3 through 7 are the standard errors and proportion
of schools with given percentile differences for matched and ranked
conditions. Data are presented for four categories of grade enrollment as well as
the total sample of 500 schools. Table 3 contains results on both conditions
for self esteem. Because of the greater number of subtests examined for
verbal and composite achievement, results for matched and ranked conditions
are given in separate tables.



Place Tables 3-7 about here 



One can readily note the increase in estimation error resulting from an
increase in the number of subtests. It should be remembered that with each
increase in the number of subtests, there was a corresponding decrease in the
number of items forming a subtest, hence, a decrease in the number of
observations comprising the estimated school mean. Also consistent with
statistical expectation is the increase in estimation error associated with
a decrease in grade enrollment. The latter effect is a significant one for
a state assessment program when one considers the large number of schools
with small grade enrollments. In the present sample about 125 of the 500
elementary schools (25 percent) have a 5th grade enrollment of 30 or less.
Thus it is imperative to examine the estimated amount of error for schools
of various grade enrollments as well as for the total sample.

Returning to a comparison of the matched and ranked item assignment
conditions, a perusal of the 12 parallel cases exhibited in Tables 3-7
reveals a highly consistent picture of smaller error estimates for the
matched condition. As one should expect, casting the data in the form of
percentile differences leads to the same pattern of superiority in favor of
the matched condition. In summary, the matched condition demonstrated a
more favorable profile (higher correlations, smaller standard error and lower
frequency of deviant percentiles) than the ranked or dissimilar condition
for all cases studied. Support was thereby obtained for establishing subtests
that are very similar to one another in terms of average item mean. However,
it should be remembered that the effect displayed is one of extremes. The
ranked condition may be regarded as the least desirable method for allocating
items to subtests. Such a condition is unlikely to occur in practice, but
the results do help to define an "upper bound" of error. In the absence of
stable item analysis information to first stratify items according to mean
score or other statistical properties, one would allocate items by random
assignment. Simple random assignment should assure a similar composition
of items across subtests, especially when the number of items per subtest
is large, and thereby achieve an essentially matched condition.
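
The claim that random assignment stays well inside the ranked "upper bound" can be illustrated with a small Python sketch. The 40 item means used here are hypothetical, evenly spaced over the 1.29 to 2.30 range reported for the self esteem scale; they are not the actual item means, and the function names are invented for the illustration.

```python
import random


def spread(item_means, subtests):
    """Largest minus smallest average item mean across subtests."""
    averages = [sum(item_means[j] for j in s) / len(s) for s in subtests]
    return max(averages) - min(averages)


def random_assignment(n_items, n_subtests, rng):
    """Shuffle the items, then deal them into equal-sized subtests."""
    order = list(range(n_items))
    rng.shuffle(order)
    return [order[t::n_subtests] for t in range(n_subtests)]


# Hypothetical item means (illustration only).
item_means = [1.29 + (2.30 - 1.29) * j / 39 for j in range(40)]
n_subtests, k = 4, 10

# Spread under the ranked condition: highest 10 items versus lowest 10.
ordered = sorted(item_means)
ranked_spread = sum(ordered[-k:]) / k - sum(ordered[:k]) / k

rng = random.Random(1975)   # fixed seed so the run is repeatable
random_spreads = [spread(item_means, random_assignment(40, n_subtests, rng))
                  for _ in range(200)]
print(ranked_spread, max(random_spreads), sum(random_spreads) / 200)
```

No random partition can exceed the ranked spread, since the ranked condition pairs the lowest possible subtest average with the highest possible one.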

While one can readily observe that the standard error increased with
an increase in subtests and with a decrease in grade enrollment, how does
one judge the amount of tolerable error? In retrospect it might have been
better to score the achievement tests in terms of proportion of correct
answers rather than number correct. This would standardize reporting across
tests. Establishing an acceptable range of error might be accomplished more
readily since the metric itself is easily understood. In the case of a
non-cognitive scale with a Likert type format, a summated score is more obscure,
unless the items are dichotomously scored. Then scoring could take the
form of proportion of items answered in the positive direction. To be
understood by non-statistically oriented individuals who must make policy
decisions regarding a large scale assessment program, it seemed reasonable
to translate the data into terms which might be more readily apprehended.
Considering the results in terms of the difference in percentile rank attained
by the school's actual and estimated mean was an effort at getting a
picture of the simulation in a context familiar to the assessment program's
policy makers.

Consider the matched condition results shown in Table 6 for composite
achievement. Suppose that a percentile difference of ± 0 to 10 points
represented a "tolerable" range of error. Looking first at the matched
condition for five subtests and combining the 0-5 and 6-10 categories,
we find that 69 percent of the schools having 30 or fewer 5th grade students
fall in the acceptable error range while 84 percent of the schools with 31-60
students reach the acceptable range. With only two subtests, one could
expect 96 percent and 99 percent reaching the tolerable range for these two
enrollment categories. Compare these results with those obtained on an
instrument containing half as many items. In Table 4 only 55 percent and
76 percent of the schools in the lower two enrollment categories reach the
acceptable range when there are five subtests. With two subtests the
situation improved to 85 percent and 96 percent.
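
The percentages quoted above come from adding the 0-5 and 6-10 proportions in the relevant table rows. The sketch below repeats that arithmetic for the Table 6 rows cited in the text; the dictionary layout and function name are illustrative only.

```python
# (number of subtests, grade enrollment): (proportion 0-5, proportion 6-10)
# Values are the Table 6 entries (matched, composite achievement, 60 items).
TABLE_6_MATCHED = {
    (2, "1-30"): (0.73, 0.23),
    (2, "31-60"): (0.85, 0.14),
    (5, "1-30"): (0.48, 0.21),
    (5, "31-60"): (0.57, 0.27),
}


def percent_within_10(p_0_5, p_6_10):
    """Percent of schools whose percentile rank moved by 10 points or less."""
    return round(100 * (p_0_5 + p_6_10))


for (n_subtests, enrollment), props in TABLE_6_MATCHED.items():
    print(n_subtests, "subtests,", enrollment, "students:",
          percent_within_10(*props), "percent")
```

The same function can be applied to any row of Tables 3 through 7 to check it against a chosen tolerance criterion.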

In evaluating the results for the Pennsylvania program, there was a 
concern that approximately 90 percent of the schools achieve estimated means 
deviating by no more than 10 percentile points. Thus, a tentative decision 
was made to develop a grade 5 test package having two or possibly three 
subtests per goal area. Even two or three subtests will yield a substantial 
savings in test taking time. 

This study would appear to lend confidence to the use of multiple matrix
sampling techniques in large scale assessment programs when the major thrust
is providing school building information as a service function as opposed to
simply obtaining statewide aggregates for presentation to the state
legislature. When an assessment program relies at least partially on a norm
referenced model of reporting data, the estimates of a school mean must have
a sufficiently low error so results are acceptable to school people. Any
large scale testing program considering multiple matrix sampling would find
simulation profitable in formulating guidelines for tailoring procedures to
the specific parameters of the program such as number of test items, type
of reporting unit and a host of other considerations.






References 



1. Pennsylvania Department of Education. Educational Quality Assessment
   in Pennsylvania: The First Six Years. Harrisburg, Pennsylvania, 1973.

2. Pennsylvania Department of Education. Educational Quality Assessment
   Manual for Interpreting Elementary School Reports. Harrisburg,
   Pennsylvania, 1974.

3. Pennsylvania Department of Education. Getting Inside the EQA Inventory:
   Grade 5. Harrisburg, Pennsylvania, 1975.

4. Coopersmith, S. The Antecedents of Self-Esteem. San Francisco,
   California: Freeman, 1967.






Table 1

Mean Values for Items Comprising Each Form
Examined in the Matched and Ranked Conditions

                                         Verbal             Composite
                    Self Esteem        Achievement         Achievement
Subtests   Form   Matched  Ranked    Matched  Ranked    Matched  Ranked

   2         1      1.74    1.52       0.59    0.44       0.61    0.44
             2      1.71    1.93       0.60    0.76       0.61    0.79

   3         1       --      --        0.59    0.39       0.62    0.39
             2       --      --        0.60    0.60       0.61    0.62
             3       --      --        0.61    0.81       0.61    0.84

   4         1      1.72    1.41        --      --        0.61    0.36
             2      1.73    1.?3        --      --        0.61    0.53
             3      1.73    1.80        --      --        0.62    0.71
             4      1.72    2.05        --      --        0.62    0.87

   5         1      1.74    1.39       0.59    0.35       0.61    0.34
             2      1.70    1.57       0.61    0.46       0.62    0.48
             3      1.74    1.73       0.61    0.61       0.62    0.62
             4      1.72    1.84       0.60    0.73       0.62    0.75
             5      1.73    2.09       0.59    0.85       0.61    0.88

   6         1       --      --        0.58    0.34       0.61    0.32
             2       --      --        0.61    0.44       0.61    0.46
             3       --      --        0.60    0.52       0.61    0.56
             4       --      --        0.61    0.68       0.62    0.68
             5       --      --        0.60    0.75       0.61    0.78
             6       --      --        0.60    0.86       0.62    0.89

(-- = condition not examined; ? = digit illegible in the source copy)



Table 2

Correlations Between Actual and Estimated
School Means for Each Case Investigated

                                     Verbal             Composite
                Self Esteem        Achievement         Achievement
Subtests      Matched  Ranked    Matched  Ranked    Matched  Ranked

   2            .986    .968       .985    .978       .989    .983
   3             --      --        .973    .960       .980    .972
   4            .941    .934        --      --        .973    .961
   5            .937    .898       .938    .920       .956    .948
   6             --      --        .928    .914       .943    .935

(-- = condition not examined)



Table 4

Multiple Matrix Sampling Simulation
Matched Conditions, Verbal Achievement, 30 Items

Number    Grade                        Proportion of Schools With
of        Enroll-            Std.     Given Percentile Differences
Subtests  ment        N      Error    0-5   6-10  11-15  16-20   21+

   2      1-30       124     0.413    .65    .20    .10    .02    .02
          31-60      173     0.347    .74    .21    .05    .00    .00
          61-90      129     0.257    .84    .14    .02    .00    .00
          91-         74     0.186    .93    .07    .00    .00    .00
          TOTAL      500     0.326    .77    .17    .04    .01    .01

   3      1-30       124     0.549    .58    .25    .08    .05    .03
          31-60      173     0.456    .68    .19    .08    .03    .02
          61-90      129     0.392    .68    .22    .08    .01    .00
          91-         74     0.255    .82    .12    .05    .00    .00
          TOTAL      500     0.444    .68    .20    .08    .02    .02

   5      1-30       124     1.045    .37    .18    .16    .12    .16
          31-60      173     0.626    .51    .25    .13    .07    .04
          61-90      129     0.507    .63    .21    .09    .04    .03
          91-         74     0.378    .64    .23    .12    .01    .00
          TOTAL      500     0.701    .53    .22    .13    .07    .06

   6      1-30       124     1.007    .45    .18    .05    .08    .24
          31-60      173     0.742    .4?    .25    .14    .10    .08
          61-90      129     0.594    .5?    .18    .14    .04    .05
          91-         74     0.406    .64    .24    .12    .00    .00
          TOTAL      500     0.750    .51    .21    .12    .07    .10

(? = digit illegible in the source copy)



Table 3

Multiple Matrix Sampling Simulation, Matched and
Ranked Conditions, Self Esteem, 40 Items

Number              Grade                        Proportion of Schools With
of                  Enroll-            Std.     Given Percentile Differences
Subtests  Condition ment        N      Error    0-5   6-10  11-15  16-20   21+

   2      Matched   1-30       125     0.966    .66    .25    .07    .02    .00
                    31-60      174     0.839    .73    .15    .09    .03    .01
                    61-90      128     0.584    .85    .10    .03    .02    .00
                    91-         73     0.438    .89    .05    .05    .00    .00
                    TOTAL      500     0.776    .77    .15    .06    .02    .00

          Ranked    1-30       125     1.517    .49    .19    .20    .02    .10
                    31-60      174     1.092    .52    .22    .13    .07    .05
                    61-90      128     0.808    .51    .27    .13    .05    .04
                    91-         73     0.777    .48    .25    .21    .07    .00
                    TOTAL      500     1.117    .50    .23    .16    .05    .04

   4      Matched   1-30       125     2.249    .46    .16    .15    .07    .16
                    31-60      174     1.493    .51    .25    .09    .11    .05
                    61-90      128     1.182    .57    .23    .13    .05    .01
                    91-         73     0.736    .64    .25    .11    .00    .00
                    TOTAL      500     1.570    .53    .22    .12    .07    .06

          Ranked    1-30       125     2.259    .41    .19    .16    .06    .18
                    31-60      174     1.651    .41    .24    .14    .08    .12
                    61-90      128     1.156    .49    .23    .16    .07    .05
                    91-         73     1.084    .45    .33    .14    .04    .04
                    TOTAL      500     1.659    .44    .24    .15    .07    .10

   5      Matched   1-30       125     2.382    .49    .10    .13    .10    .18
                    31-60      174     1.518    .51    .25    .09    .09    .06
                    61-90      128     1.149    .55    .23    .13    .08    .01
                    91-         73     0.830    .67    .16    .11    .04    .01
                    TOTAL      500     1.627    .54    .20    .11    .08    .07

          Ranked    1-30       125     2.931    .36    .22    .10    .07    .24
                    31-60      174     2.031    .38    .1?    .17    .11    .17
                    61-90      128     1.414    .51    .2?    .13    .05    .10
                    91-         73     1.118    .42    .??    .19    .04    .08
                    TOTAL      500     2.064    .41    .21    .15    .08    .16

(? = digit illegible in the source copy)



Table 5

Multiple Matrix Sampling Simulation
Ranked Conditions, Verbal Achievement, 30 Items

Number    Grade                        Proportion of Schools With
of        Enroll-            Std.     Given Percentile Differences
Subtests  ment        N      Error    0-5   6-10  11-15  16-20   21+

   2      1-30       124     0.???    .55    .27    .10    .05    .03
          31-60      173     0.403    .68    .22    .08    .02    .00
          61-90      129     0.318    .77    .20    .03    .00    .00
          91-         74     0.265    .77    .20    .03    .00    .00
          TOTAL      500     0.403    .68    .22    .06    .02    .01

   3      1-30       124     0.735    .50    .24    .10    .07    .09
          31-60      173     0.545    .61    .22    .10    .04    .03
          61-90      129     0.430    .67    .24    .07    .02    .00
          91-         74     0.360    .66    .19    .14    .01    .00
          TOTAL      500     0.551    .60    .23    .10    .04    .03

   5      1-30       124     1.093    .34    .24    .13    .10    .19
          31-60      173     0.737    .43    .28    .14    .06    .09
          61-90      129     0.541    .57    .20    .13    .06    .03
          91-         74     0.476    .58    .23    .12    .05    .01
          TOTAL      500     0.770    .47    .24    .13    .07    .09

   6      1-30       124     1.099    .35    .29    .10    .08    .18
          31-60      173     0.808    .47    .24    .10    .10    .09
          61-90      129     0.643    .46    .26    .13    .07    .08
          91-         74     0.574    .59    .22    .12    .04    .03
          TOTAL      500     0.824    .46    .25    .11    .08    .10

(? = digit illegible in the source copy)



Table 6

Multiple Matrix Sampling Simulation
Matched Conditions, Composite Achievement, 60 Items

Number    Grade                        Proportion of Schools With
of        Enroll-            Std.     Given Percentile Differences
Subtests  ment        N      Error    0-5   6-10  11-15  16-20   21+

   2      1-30       124     0.661    .73    .23    .04    .01    .00
          31-60      173     0.457    .85    .14    .01    .00    .00
          61-90      128     0.353    .??    .0?    .00    .00    .00
          91-         75     0.309    .??    .0?    .00    .00    .00
          TOTAL      500     0.475    .87    .1?    .01    .00    .00

   3      1-30       124     0.831    .66    .22    .07    .04    .01
          31-60      173     0.689    .71    .19    .09    .01    .00
          61-90      128     0.???    .??    .??    .0?    .00    .00
          91-         75     0.361    .91    .09    .00    .00    .00
          TOTAL      500     0.648    .75    .1?    .??    .01    .00

   4      1-30       124     0.992    .56    .26    .10    .02    .05
          31-60      173     0.758    .69    .20    .08    .03    .00
          61-90      128     0.656    .7?    .??    .0?    .00    .01
          91-         75     0.465    .??    .17    .00    .00    .00
          TOTAL      500     0.764    .??    .?1    .07    .0?    .01

   5      1-30       124     1.297    .48    .21    .14    .10    .07
          31-60      173     1.016    .57    .27    .03    .04    .04
          61-90      128     0.739    .70    .19    .07    .03    .01
          91-         75     0.???    .71    .20    .08    .01    .00
          TOTAL      500     0.987    .60    .22    .09    .05    .03

   6      1-30       124     1.680    .34    .28    .17    .09    .12
          31-60      173     1.021    .56    .24    .10    .08    .01
          61-90      128     0.830    .73    .14    .0?    .03    .01
          91-         75     0.594    .69    .25    .05    .00    .00
          TOTAL      500     1.137    .57    .23    .11    .06    .04

(? = digit illegible in the source copy)



Table 7

Multiple Matrix Sampling Simulation
Ranked Conditions, Composite Achievement, 60 Items

Number    Grade                        Proportion of Schools With
of        Enroll-            Std.     Given Percentile Differences
Subtests  ment        N      Error    0-5   6-10  11-15  16-20   21+

   2      1-30       124     0.851    .61    .27    .08    .02    .01
          31-60      173     0.594    .79    .18    .03    .00    .00
          61-90      128     0.448    .??    .1?    .03    .00    .00
          91-         75     0.384    .??    .08    .00    .00    .00
          TOTAL      500     0.614    .7?    .17    .04    .01    .00

   3      1-30       124     1.075    .50    .31    .12    .02    .05
          31-60      173     0.804    .70    .21    .06    .03    .01
          61-90      128     0.614    .79    .17    .03    .01    .00
          91-         75     0.554    .73    .23    .03    .01    .00
          TOTAL      500     0.810    .??    .??    .06    .02    .01

   4      1-30       124     1.251    .46    .26    .12    .10    .06
          31-60      173     0.947    .55    .29    .09    .05    .02
          61-90      128     0.664    .79    .14    .05    .02    .00
          91-         75     0.601    .80    .1?    .01    .03    .00
          TOTAL      500     0.929    .??    .23    .08    .05    .02

   5      1-30       124     1.495    .4?    .2?    .18    .04    .11
          31-60      173     1.044    .58    .21    .09    .07    .05
          61-90      128     0.801    .66    .20    .10    .03    .01
          91-         75     0.670    .73    .21    .04    .00    .01
          TOTAL      500     1.079    .58    .22    .11    .04    .05

   6      1-30       124     1.598    .45    .20    .15    .10    .10
          31-60      173     1.322    .54    .18    .13    .06    .08
          61-90      128     0.808    .69    .19    .07    .04    .02
          91-         75     0.719    .63    .29    .07    .01    .00
          TOTAL      500     1.217    .57    .20    .11    .06    .06

(? = digit illegible in the source copy)



